Performance Evaluation of Cascade ALU Architecture for Asynchronous Super-Scalar Processors
Authors
Abstract
Current out-of-order architectures have their critical path in the memory structure. Since the memory access delay consists mainly of wire delays, feature size reduction will contribute little to shortening the critical path. Therefore, the performance of the out-of-order architecture will not improve despite the expected advances in future technologies. To solve this problem, we present a novel architecture, called the Cascade ALU architecture, in which the critical path lies in the ALU. Since the ALU latency consists mainly of gate delays, the cycle time can be reduced with feature size reduction. In the Cascade ALU architecture, the instruction execution latency varies depending on the instructions executed. Thus, an asynchronous implementation is suitable for the Cascade ALU. However, the asynchronous handshake overhead may be too large for the Cascade ALU to enhance processor performance, so we show a method for hiding the handshake overhead based on fine-grain pipelining. Finally, we present evaluation results demonstrating that the performance of the Cascade ALU architecture scales well as the ALU latency is reduced.
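The trade-off described in the abstract, variable instruction latency versus handshake overhead, can be illustrated with a toy timing model. This is only a sketch under stated assumptions: the function names, the latency values, and the handshake costs are hypothetical and are not taken from the paper, and the model ignores pipelining and instruction-level parallelism entirely.

```python
# Toy timing model. All names and numbers are hypothetical illustrations,
# not figures or methods from the paper.

def synchronous_time(latencies):
    """Clocked design: every instruction is charged the worst-case latency."""
    return len(latencies) * max(latencies)


def asynchronous_time(latencies, handshake_overhead):
    """Self-timed design: each instruction takes its own latency
    plus a fixed per-instruction handshake cost (assumed not hidden)."""
    return sum(lat + handshake_overhead for lat in latencies)


# Arbitrary time units per instruction (assumed values).
latencies = [1, 1, 2, 4]

sync_total = synchronous_time(latencies)        # 4 instructions * 4 units
async_small = asynchronous_time(latencies, 1)   # small handshake cost
async_large = asynchronous_time(latencies, 3)   # large handshake cost

print(sync_total, async_small, async_large)
```

With these assumed numbers the asynchronous variant wins only while the handshake cost stays small, which is the motivation the abstract gives for hiding the handshake overhead.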
Similar resources
A Simulator for the Hydra CMP
In recent years, Single Chip Multiprocessors have been gaining ground as alternatives to superscalar processor architectures. Comparisons [3] between CMPs and Superscalar Processors argue the case for chip multiprocessors through the results that super scalar architectures outperform CMPs of comparable die size and cost by only a small margin where coarse grained parallelism is not available. O...
A CMOS VLSI Implementation of an Asynchronous ALU
A CMOS self-timed ALU has been developed as part of an asynchronous implementation of the ARM microprocessor. This unit exploits the data dependency inherent in many arithmetic operations to enable a small, simple ALU to deliver a mean performance comparable with that of a more sophisticated synchronous one with consequent reductions in both silicon area and electrical power consumption. The se...
Modeling and Performance Evaluation of Multi-Processors Organization with Shared Memories
This paper is primarily concerned with the theoretical evaluation of the performance of multiprocessor systems. A Markovian waiting-line model has been developed for various multiprocessor configurations with shared memory. The system is analysed at the request level rather than the job level.
Comparison of Architecture Processors Focusing on ALU and Floating Point Unit Designs
This paper evaluates and compares some of the fundamental metrics for selected ALU designs, covering both time and space complexity. The references examined consist of papers from 1999 to the present. A total of ten different ALU and floating-point designs are examined and compared. Some of the key components evaluated are clock rate, memory capacity, components, floating point, ALU, FPGA,...